Abstract
Satellite multisensor precipitation products (SMPPs) have a variety of potential uses, but 
suffer from relatively poor accuracy due to systematic biases and random errors in 
precipitation occurrence and magnitude. We use the Censored Shifted Gamma Distribution 
(CSGD) to characterize the Tropical Rainfall Measuring Mission Multi-Satellite
Precipitation Analysis (TMPA), a commonly-used SMPP, and to compare it against the 
rain gage-based North American Land Data Assimilation System Phase 2 (NLDAS-2) 
reference precipitation dataset across the conterminous United States. The CSGD describes 
both the occurrence and the magnitude of precipitation. Climatological CSGD 
characterization reveals significant regional differences between TMPA and NLDAS-2 in 
terms of magnitude and probability of occurrence. We also use a flexible CSGD-based 
error modeling framework to quantify errors in TMPA relative to NLDAS-2. The 
framework can model conditional bias as either a linear or nonlinear function of satellite 
precipitation rate and can produce a “conditional CSGD” describing the distribution of
“true” precipitation based on a satellite observation. The framework is also used to “merge” 
TMPA with atmospheric variables from Modern-Era Retrospective analysis for Research 
and Applications (MERRA-2) to reduce SMPP errors. Despite the coarse resolution of 
MERRA-2, this merging offers robust reductions in random error due to the better
performance of numerical models in resolving stratiform precipitation. Improvements in
the near-realtime version of TMPA are relatively greater than for the higher-latency
research version.

1. Introduction


Precipitation data is critical in a variety of subjects including climate studies, meteorology, 
hydrology, and natural hazards. While precipitation is relatively easy to measure at a single 
point using a rain gage, measurement over large regions at high spatial and temporal 
resolution is a major challenge. A “constellation” of earth-observing satellite missions,
including the now-defunct Tropical Rainfall Measuring Mission (TRMM) and the follow-on
Global Precipitation Measurement (GPM) mission, both co-led by the National Aeronautics
and Space Administration (NASA) and the Japan Aerospace Exploration Agency, helps
address this challenge. These satellites provide a mix of direct measurements of precipitation
and related processes using active radar and indirect measurements using passive microwave
(PMW) and infrared (IR) sensors. Satellite multisensor precipitation products (SMPPs) merge these various
observations to create near-global precipitation records that approach two decades in 
length. Examples include the 3-hourly, 0.25° Tropical Rainfall Measuring Mission
Multi-Satellite Precipitation Analysis (TRMM TMPA; Huffman et al., 2010, 2007); the 
30-minute, 8 km Climate Prediction Center (CPC) Morphing Technique (CMORPH; Joyce 
et al., 2004); and the hourly, 4 km Precipitation Estimation from Remote Sensing 
Information using Artificial Neural Networks (PERSIANN; Sorooshian et al., 2000). Most 
SMPPs are available in near-realtime (with latency on the order of several hours) and some 
have non-realtime variants that utilize ground-based rain gage information for bias 
correction. Launched in 2014, the GPM mission builds on TRMM’s legacy with an 
advanced active and passive instrument package. NASA’s 30-minute, 0.1° Integrated
Multi-satellitE Retrievals for GPM (IMERG; Huffman et al., 2014) dataset builds on more
than a decade of experience with SMPPs, combining the strengths of TMPA, CMORPH,
and PERSIANN and incorporating additional improvements.


Despite widespread interest in SMPPs, these datasets often exhibit considerable errors, 
both systematic (i.e. bias) and random, stemming from a variety of sources. Observation 
quality varies within the satellite constellation, with active radar being the most accurate, 
followed by PMW and IR. Sensor technology and resolution vary with age and mission.
The current constellation of satellites provides a PMW observation for most locations on 
Earth approximately every three hours, while radar observations are much less frequent. 
Between PMW measurements, algorithms typically use spatiotemporal interpolation of 
PMW or “infilling” using lower-accuracy IR. PMW observations tend to be more accurate 
nearer the tropics and for convective than for stratiform storm systems (Ebert et al., 2007) 
and are influenced by the underlying land or water surface, and microwave emissions from 
snow or ice-covered ground can be difficult to distinguish from emissions due to ice scatter 
in precipitating clouds (Ferraro et al., 2013; Ringerud et al., 2014; Tian and Peters-Lidard, 
2007). IR and PMW instruments have difficulties with orographic precipitation systems 
due to their shallow nature (Shige et al., 2013) and high variability in microscale and
macroscale dynamics (Anders et al., 2007).


Given the potential usefulness of SMPPs, it is natural to want to characterize SMPP errors
using an error model that compares SMPP against “ground truth,” i.e. more reliable
reference data (typically rain gages or ground-based weather radar). Systematic error is
usually heteroscedastic (i.e. depends on precipitation observation magnitude), a
phenomenon known as conditional bias (Ciach et al., 2000). Such errors tend to be
multiplicative (Tian et al., 2013) with a magnitude that increases with precipitation 
observation intensity. Error models can be used to identify and thus remove systematic 
errors. They can also describe the statistical distribution of random errors, which can be 
understood as the residuals once the systematic error has been removed. Using this 
approach, individual random errors are irreducible without some sort of additional
explanatory information.


SMPP characterization efforts (e.g. AghaKouchak et al., 2011; Behrangi et al., 2011; Tian 
et al., 2009) often distinguish between three error “cases”: false alarms, in which the SMPP 
reports precipitation while the reference data does not; misses, in which the reference 
reports precipitation while the SMPP does not; and hits, in which both report precipitation, 
but not necessarily of the same magnitude. Most error models that have been developed in 
the context of precipitation estimation using ground-based radar (AghaKouchak et al., 
2010; Ciach et al., 2007; Germann et al., 2009) and SMPP (Gebremichael et al., 2011a;


Sarachi et al., 2015; Yan and Gebremichael, 2009) have tended to focus on hit cases only. 
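The three error cases described above can be illustrated with a short sketch. It is not taken from any of the cited studies: the function names, the wet/dry cutoff `threshold_mm`, and the sample values are assumptions made for illustration, and the fourth label covers days on which neither dataset reports precipitation.

```python
from collections import Counter

def classify_pair(satellite_mm, reference_mm, threshold_mm=0.0):
    """Label one satellite/reference pair of daily precipitation depths.

    threshold_mm is a hypothetical wet/dry cutoff, not a value from the text.
    """
    satellite_wet = satellite_mm > threshold_mm
    reference_wet = reference_mm > threshold_mm
    if satellite_wet and reference_wet:
        return "hit"
    if satellite_wet:
        return "false_alarm"
    if reference_wet:
        return "miss"
    return "correct_negative"

def contingency_counts(satellite, reference, threshold_mm=0.0):
    """Count each case over two equal-length time series."""
    return Counter(
        classify_pair(s, r, threshold_mm) for s, r in zip(satellite, reference)
    )

# Made-up daily depths in mm, for illustration only.
satellite = [0.0, 3.2, 0.0, 12.5, 0.4]
reference = [0.0, 2.1, 1.8, 9.0, 0.0]
counts = contingency_counts(satellite, reference)
```

An error model restricted to hits would use only two of these five days, which is one motivation for frameworks that treat all cases jointly.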


Several previous SMPP error models have considered false alarms, misses, and hits 
separately, and then recombined these separate descriptions to create an overall estimated
distribution of true precipitation. For example, the Precipitation Uncertainties for Satellite 
Hydrology framework (PUSH) introduced by Maggioni et al. (2014) uses a Gamma 
distribution to describe the precipitation intensity associated with misses, exponential
decay and linear regression models respectively to describe the probability and intensity
associated with false alarms, and a generalized linear model to generate a Gamma
distribution of precipitation magnitude associated with hits. PUSH also uses a uniform 
distribution to describe possible trace precipitation associated with cases where neither the 
SMPP nor reference data report precipitation. For any zero or nonzero SMPP observation, 
a probability distribution can be generated by combining these cases. The two-dimensional 
Satellite Rainfall Error Model (SREM2D) introduced by Hossain and Anagnostou (2006)
takes a somewhat similar approach, but incorporates spatial and temporal autocorrelation
functions to construct ensembles of correlated precipitation fields.


This study applies a new censored shifted gamma distribution (CSGD) methodology to characterize
precipitation and create an SMPP error model that produces a “best guess” distribution of
the true precipitation by considering hits, misses, and false alarms. The CSGD technique
presented in this paper is arguably simpler than most, and comparison with the PUSH error
model suggests that this relative simplicity is advantageous.


Previous precipitation error model studies have generally focused on relatively small 
geographic areas where spatial stationarity of rainfall and model parameters can be 
assumed; however, these approaches have not explored spatial variability in these 
parameters or in model performance. This study is one of the few, along with Maggioni et 
al. (2016), that applies an error model over a large region to better understand SMPP 
performance characteristics and how they are tied to physiographic and climatological
features.


This study moves beyond the traditional notions of precipitation error modeling towards
error correction by allowing the incorporation of additional information to reduce random 
errors. Previous researchers have suggested that topography and other land surface 
characteristics as well as other atmospheric variables such as humidity could help 
understand and, in principle, correct SMPPs (Gebregiorgis and Hossain, 2013; 
Gebremichael et al., 2011a). As far as we are aware, this study is the first to explore the 
potential benefit of incorporating atmospheric variables such as humidity and precipitation 
from numerical weather models (specifically, atmospheric reanalysis) in a satellite 
precipitation error model to reduce SMPP random errors. This is a promising approach 
since the complementary performances of numerically-simulated and remotely-sensed 
precipitation estimates provide the opportunity to produce merged datasets with smaller
systematic and random errors.


The SMPP, ground reference, and atmospheric reanalysis datasets utilized in this study are 
described in Section 2. The CSGD and the CSGD-based precipitation error modeling and 
correction frameworks are introduced in Section 3. Results for precipitation 
characterization and SMPP error modeling are provided in Section 4. Summary and closing
discussion follow in Section 5.


2. Data
This study focuses on daily-scale, 0.25° (approximately 25 km) precipitation over the 
conterminous United States (CONUS; see Figure 1). This large geographic extent allows
us to robustly demonstrate not only how the CSGD can be used to characterize precipitation
and how the CSGD-based error modeling framework can correct for biases and
characterize remaining uncertainties, but also how these features vary with climatic and
physiographic controls.


We examine two variants of TMPA (also known as TRMM 3B42) Version 7.0. TMPA 
merges PMW, active radar, and IR observations from multiple satellites to create a
near-global (±50° latitude) rainfall dataset with 3-hourly, 0.25° resolution. The “research”
version includes a monthly rain gage-based bias correction and is available approximately 
two months after realtime. In this study, analyses using this version cover 1998-2014. 
Several analyses consider TMPA-RT, which is available approximately 8 hours after 
realtime and only includes a gage-based climatology correction. Such near-realtime 
analyses cover 2000-2014, since the pre-2000 TRMM orbit precludes near-realtime 
analysis. “TMPA” is used to refer to the research version and “TMPA-RT” for the
near-realtime version. The TRMM satellite ceased operations in April 2015, but the TMPA
product continues to be produced using other satellites in the constellation.
NASA’s recent IMERG SMPP was not used in this study, since at the time of writing it
was only available for 2014 onward.


We use the “File A” precipitation forcing from Phase 2 of NASA’s North American Land Data
Assimilation System (NLDAS-2; Xia et al., 2012b, 2012a) as the reference. NLDAS-2 
precipitation has hourly, 0.125° resolution, disaggregated from daily CPC-Unified gage 
analysis (Chen et al., 2008; Xie et al., 2007) and features a statistical topographic correction
based on the PRISM climatology by Daly et al. (1994). NLDAS-2 was selected rather than
the Stage IV bias-corrected radar rainfall dataset that has been used in some SMPP
validation studies (AghaKouchak et al., 2011; Qiao et al., 2014) since visual inspection of
Stage IV revealed very poor performance in mountainous regions. We have aggregated
NLDAS-2 from its hourly 0.125° resolution to the same daily 0.25° resolution as the TMPA
data. Thus, the NLDAS-2 precipitation values used in this study are very similar, but not 
exactly identical, to CPC-Unified, which has been used in several previous SMPP error 
characterizations (Maggioni et al., 2016, 2014; Tian et al., 2013). The reader is referred to 
Ferguson and Mocko (2017) for a detailed explanation of the data sources utilized to create 


the NLDAS-2 precipitation forcing. 
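The aggregation step mentioned above can be sketched as follows. The text does not state how the aggregation was performed, so this sketch assumes (hypothetically) that hourly rates in mm/h are summed over 24 hours and that each 0.25° cell is the unweighted mean of the four aligned 0.125° cells it contains; the function name and the toy grid are illustrative only.

```python
def aggregate_to_daily_quarter_degree(hourly_fields):
    """Aggregate 24 hourly 0.125-degree fields to one daily 0.25-degree field.

    hourly_fields[h][i][j] is a precipitation rate in mm/h. Row and column
    counts are assumed even so that 2x2 blocks tile the grid exactly.
    """
    n_rows = len(hourly_fields[0])
    n_cols = len(hourly_fields[0][0])
    # Temporal step: sum hourly rates into a daily depth (mm/d) per fine cell.
    daily_fine = [
        [sum(field[i][j] for field in hourly_fields) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    # Spatial step: unweighted mean over each 2x2 block of fine cells.
    coarse = []
    for i in range(0, n_rows, 2):
        row = []
        for j in range(0, n_cols, 2):
            block_sum = (daily_fine[i][j] + daily_fine[i][j + 1]
                         + daily_fine[i + 1][j] + daily_fine[i + 1][j + 1])
            row.append(block_sum / 4.0)
        coarse.append(row)
    return coarse

# Toy example: a 2x4 fine grid holding the same made-up field for all 24 hours.
one_hour = [[1.0, 1.0, 0.0, 2.0],
            [1.0, 1.0, 0.0, 2.0]]
daily_coarse = aggregate_to_daily_quarter_degree([one_hour] * 24)
```

An unweighted block mean ignores the small variation in cell area with latitude inside a 0.25° cell, which is a simplification of this sketch rather than a statement about the study's procedure.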


Though there is likely overlap in terms of the rain gages used to create NLDAS-2 and those 
used to bias-correct the research version of TMPA, the CSGD-based framework does not 
require strict independence of SMPP and reference data. This study assumes that NLDAS-2
is free of errors, which is of course never the case for any dataset, let alone a
continental-scale one such as NLDAS-2. Rain gage undercatch errors in gridded rain gage datasets can
be substantial, particularly for snowfall and for extreme rainfall (Adam and Lettenmaier, 
2003). NLDAS-2 does not use a gage undercatch correction, and thus probably 
underestimates true precipitation. It should be noted that the monthly gridded rain gage 
data used to bias correct TMPA does use an undercatch correction. Thorough investigation 
of the role of gage undercatch errors in satellite precipitation validation is beyond the scope
of this study.


We also present analyses that utilize surface precipitation rate and vertically integrated total
precipitable water (TPW) from Version 2 of the Modern-Era Retrospective analysis for
Research and Applications (MERRA-2; Bosilovich et al., 2015; Rienecker et al., 2011)
from NASA. MERRA-2 is generated using an atmospheric model that assimilates a range
of surface and atmospheric observations including satellite PMW. MERRA-2 outputs have
hourly, 0.5° latitude by 0.625° longitude resolution. It is unnecessary to regrid the
MERRA-2 datasets to the 0.25° resolution of TMPA for this study, but the same daily
temporal resolution is used. Though MERRA-2 provides several surface-level precipitation
outputs, including a version primarily based on rain gages, we use the model’s
internally-generated precipitation to ensure greater independence from TMPA and NLDAS-2 and to
illustrate the value of numerically-generated precipitation and other atmospheric variables
for reducing SMPP errors.


The precipitation datasets utilized in this study consider all seasons and precipitation 
phases (i.e. rain, snow, hail, etc.), represented in terms of depth of liquid water. 
Determination of precipitation phase is a challenge in gridded precipitation datasets, 
whether the underlying data come from rain gage networks, satellites, ground-based radar,
or numerical models.


We treat data prior to 2014 as the “training period,” i.e. used for model parameter
estimation as well as error analysis. Data from 2014 is used as “validation,” to assess model
robustness when used outside of the training period. Though this training period is much
longer than the validation period, this typifies many settings in which an error model might
be used, since many reference datasets date at least as far back as most or all SMPPs.


3. Methods 


3.1. The CSGD
The two-parameter Gamma distribution has been used in precipitation modeling since at 
least Das (1955). Like precipitation itself, the Gamma distribution is left-bounded at zero, 
and can take many possible “shapes,” in terms of its density and cumulative distribution 
function (CDF). Generally, a precipitation process can be modeled in two steps using a 
total of three parameters. First, the probability of occurrence is modeled via a Bernoulli 
trial with the “success” parameter equal to the probability of precipitation (POP). Second, 
the nonzero precipitation magnitude is modeled via the two-parameter Gamma with shape 
parameter $k$ and scale parameter $\theta$ expressed using the distribution mean $\mu$ and standard
deviation $\sigma$ by

$$k = \frac{\mu^2}{\sigma^2}, \qquad \theta = \frac{\sigma^2}{\mu} \tag{1}$$

The CSGD is an alternative formulation presented in Scheuerer and Hamill (2015) in
which the CDF is “shifted” left and subsequently left-censored at zero, meaning all
negative values are replaced by zero. Thus, the density to the left of zero represents the
probability of zero precipitation (1 − POP), while the density to the right of zero represents
the likelihood of a particular nonzero value. To achieve this, a “shift” parameter $\delta$, $\delta < 0$,
is introduced such that, if $F_{k,\theta}$ denotes the CDF of a gamma distribution, then the CDF of
the CSGD model is defined by

$$F_{k,\theta,\delta}(x) = \begin{cases} F_{k,\theta}(x - \delta) & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases} \tag{2}$$


where $x$ is rainfall depth. In this way, the CSGD eliminates the initial Bernoulli trial from
the precipitation modeling process, though the introduction of $\delta$ means the total number of
parameters remains at three. Thus, while the conventional Gamma distribution has the
property that $F_{k,\theta}(0) = 0$ (i.e. the CDF is equal to zero at zero rainfall depth), the CDF of
a CSGD has the property $F_{k,\theta,\delta}(0) = 1 - \mathrm{POP}$ (see Figure 2). Scheuerer and Hamill
(2015) provide details for CSGD parameter estimation based on minimization of the 
continuous ranked probability score, which essentially minimizes the integrated quadratic 


distance between the empirical and theoretical CSGD distribution functions. 
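As a rough illustration of the CSGD definition, the sketch below evaluates the CDF and the implied POP using only the Python standard library. It assumes the moment relations k = μ²/σ² and θ = σ²/μ, with μ and σ describing the gamma distribution before shifting and censoring. The function names are hypothetical, and the series used for the gamma CDF is a simple textbook expansion that is adequate for moderate arguments, not a production-quality routine.

```python
import math

def gamma_cdf(x, k, theta):
    """Gamma CDF from the power series of the regularized incomplete gamma.

    Converges for any x > 0 but becomes slow and less accurate for very
    large x / theta; this is a sketch, not a robust numerical routine.
    """
    if x <= 0.0:
        return 0.0
    z = x / theta
    term = 1.0 / k
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (k + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(-z + k * math.log(z) - math.lgamma(k)))

def csgd_cdf(x, mu, sigma, delta):
    """CSGD CDF: the gamma CDF shifted by delta (< 0), left-censored at zero."""
    if x < 0.0:
        return 0.0
    k = mu ** 2 / sigma ** 2      # shape from mean and standard deviation
    theta = sigma ** 2 / mu       # scale from mean and standard deviation
    return gamma_cdf(x - delta, k, theta)

def csgd_pop(mu, sigma, delta):
    """Probability of precipitation implied by the CSGD: 1 - F(0)."""
    return 1.0 - csgd_cdf(0.0, mu, sigma, delta)

# Hypothetical parameters: mu = sigma gives k = 1, i.e. an exponential
# distribution, so the result can be checked by hand: POP = exp(-0.5).
pop = csgd_pop(4.0, 4.0, -2.0)
```

With these made-up parameters, roughly 39% of the probability mass lies to the left of zero after shifting and is assigned to zero precipitation by the censoring.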


CDFs for “climatological CSGDs” (to distinguish from conditional CSGDs, described in 
Section 3.2) are shown for the 0.25° grid cells nearest to Charlotte, North Carolina and 
Denver, Colorado (top panel of Figure 3). These demonstrate good fit to the empirical 
CDFs, while highlighting the differences between locations and between TMPA and
NLDAS-2.


3.2. CSGD-Based Error Modeling and Correction Framework 

The climatological CSGD is insufficient for generating a distribution of estimated “true” 
precipitation values (or, equivalently, a distribution of SMPP errors) based on a given 
observation $R_s(t)$ at time $t$, since the mean $\mu(t)$, standard deviation $\sigma(t)$, and perhaps
shift $\delta(t)$ depend on the magnitude of $R_s(t)$. Thus, we use a CSGD-based error modeling
framework to reduce systematic SMPP biases, and to model and reduce SMPP random
errors. The framework was first introduced in Scheuerer and Hamill (2015) and further
explored in Baran and Nemoda (2016) for statistical post-processing of ensemble
numerical precipitation forecasts. The CSGD-based approach uses a statistical regression
model “trained” using a past record of contemporaneous satellite and reference
observations. The regression model is then conditioned using a satellite observation for
time $t$ to generate “conditional CSGD” parameters $\mu(t)$, $\sigma(t)$, and $\delta(t)$ from the
climatological CSGD parameters $k$, $\theta$, and $\delta$.


In the simplest version, $\mu(t)$ increases linearly with $R_s(t)$ and $\sigma(t)$ increases
proportionally to the square root of $\mu(t)$. Allowing $\delta(t)$ to vary offers little benefit and can
lead to parameter estimation difficulties (M. Scheuerer, personal comm., February 27,
2017). We will refer to this version as the “linear model,” since it models conditional bias
linearly with precipitation rate. It has the form

$$\mu(t) = \mu \left( \alpha_2 + \alpha_3 \frac{R_s(t)}{\bar{R}_s} \right) \tag{3}$$

$$\sigma(t) = \alpha_4 \, \sigma \sqrt{\frac{\mu(t)}{\mu}} \tag{4}$$

$$\delta(t) = \delta \tag{5}$$

where $\bar{R}_s$ denotes the mean of the SMPP time series. Example CDFs of conditional CSGDs
are shown in the lower panel of Figure 3 for $R_s(t)$ values of 2.5 and 25 mm/d for the 0.25°
grid cells nearest to Charlotte, North Carolina and Denver, Colorado. These show that as
$R_s(t)$ increases, the probability of the true precipitation being zero decreases (approaching
zero for $R_s(t) = 25$ mm/d) while the probability of higher true values increases. The value
of $\mu(t)$ will always be nonzero and greater than the conditional median at time $t$, which will


be equal to zero when the conditional POP is less than 0.5. 
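The linear model can be sketched in a few lines. The sketch assumes the form μ(t) = μ(α₂ + α₃ R_s(t)/R̄_s), σ(t) = α₄ σ √(μ(t)/μ), and δ(t) = δ, where unsubscripted μ, σ, and δ are the climatological values. The coefficient numbering, the function name, and all parameter values are assumptions made for illustration, not fitted values from this study.

```python
import math

def linear_conditional_csgd(r_s, r_s_mean, climatology, alpha2, alpha3, alpha4):
    """Return (mu_t, sigma_t, delta_t) for one satellite observation r_s.

    climatology is (mu, sigma, delta) of the climatological CSGD.
    The alpha names follow the assumed numbering described in the lead-in.
    """
    mu_c, sigma_c, delta_c = climatology
    mu_t = mu_c * (alpha2 + alpha3 * r_s / r_s_mean)      # linear in r_s
    sigma_t = alpha4 * sigma_c * math.sqrt(mu_t / mu_c)   # grows with sqrt(mu_t)
    return mu_t, sigma_t, delta_c                         # shift held constant

# Hypothetical values chosen only to make the arithmetic easy to follow.
climatology = (4.0, 6.0, -1.0)   # mm/d
dry_day = linear_conditional_csgd(0.0, 3.0, climatology, 0.25, 0.75, 1.0)
wet_day = linear_conditional_csgd(12.0, 3.0, climatology, 0.25, 0.75, 1.0)
```

Here `dry_day` still has a positive conditional mean (1.0 mm/d with these made-up coefficients), which is how a model of this form can assign nonzero probability to precipitation that the satellite missed.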


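Scheuerer and Hamill (2015) also present a more complex version that can account for
nonlinearity in conditional bias. This model, henceforth called the nonlinear
model, has the form

$$\mu(t) = \frac{\mu}{\alpha_1} \operatorname{log1p} \left[ \operatorname{expm1}(\alpha_1) \left( \alpha_2 + \alpha_3 \frac{R_s(t)}{\bar{R}_s} \right) \right] \tag{6}$$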


where log1p(x) = log (1 + x) and expm1(x) = exp(x) - 1. 
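A small sketch can show how the nonlinear form behaves. It assumes μ(t) = (μ/α₁) log1p[expm1(α₁)(α₂ + α₃ R_s(t)/R̄_s)], with μ the climatological mean; the function names and numerical values are hypothetical. Under this assumed form, the model approaches the linear one as α₁ tends to zero and bends below it at high rates when α₁ > 0.

```python
import math

def nonlinear_conditional_mean(r_s, r_s_mean, mu_c, alpha1, alpha2, alpha3):
    """Conditional mean under the assumed nonlinear form (alpha1 != 0).

    The bracketed argument must stay above -1 for log1p to be defined.
    """
    linear_term = alpha2 + alpha3 * r_s / r_s_mean
    return (mu_c / alpha1) * math.log1p(math.expm1(alpha1) * linear_term)

def linear_conditional_mean(r_s, r_s_mean, mu_c, alpha2, alpha3):
    """Linear counterpart, used here only for comparison."""
    return mu_c * (alpha2 + alpha3 * r_s / r_s_mean)

# Hypothetical inputs: a 12 mm/d satellite value against a 3 mm/d mean.
linear_value = linear_conditional_mean(12.0, 3.0, 4.0, 0.25, 0.75)
nearly_linear = nonlinear_conditional_mean(12.0, 3.0, 4.0, 1e-6, 0.25, 0.75)
curved = nonlinear_conditional_mean(12.0, 3.0, 4.0, 1.0, 0.25, 0.75)
```

Using `math.log1p` and `math.expm1` rather than `log(1 + x)` and `exp(x) - 1` keeps the calculation accurate when α₁ is small, which is presumably why the model is written in terms of those functions.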

The regression framework can also accommodate an arbitrary number $n$ of additional
contemporaneous covariates $C_1(t), C_2(t), \ldots, C_n(t)$ such as TPW, temperature, or
humidity from atmospheric observations or simulations. In this case, Equation 3 expands
to

$$\mu(t) = \mu \left( \alpha_2 + \alpha_3 \frac{R_s(t)}{\bar{R}_s} + \alpha_5 \frac{C_1(t)}{\bar{C}_1} + \alpha_6 \frac{C_2(t)}{\bar{C}_2} + \cdots + \alpha_{4+n} \frac{C_n(t)}{\bar{C}_n} \right) \tag{7}$$

where $\bar{C}_i$ is the mean of the time series of the $i$th covariate. A similar variant of the nonlinear
model (Equation 6) could be written to include covariates. The inclusion of covariates 
allows for additional information to be introduced to the SMPP-reference intercomparison, 
allowing the explanation of some of the residual (i.e. random) error. We use the techniques 
described in Scheuerer and Hamill (2015) to estimate the parameters of the CSGD 


correction framework. 
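The covariate extension amounts to adding one normalized term per covariate inside the linear predictor. The sketch below assumes that each covariate enters as a coefficient times C_i(t) divided by its time-series mean; the function name, the coefficient values, and the sample MERRA-2-like inputs are hypothetical.

```python
def linear_mean_with_covariates(r_s, r_s_mean, mu_c, alpha2, alpha3,
                                covariates, covariate_means, covariate_alphas):
    """Conditional mean with n additional covariates.

    covariates, covariate_means and covariate_alphas are equal-length
    sequences holding C_i(t), the mean of C_i, and its coefficient.
    """
    predictor = alpha2 + alpha3 * r_s / r_s_mean
    for value, mean, alpha in zip(covariates, covariate_means, covariate_alphas):
        predictor += alpha * value / mean   # each covariate scaled by its mean
    return mu_c * predictor

# Hypothetical day: satellite 6 mm/d, reanalysis precipitation 2 mm/d,
# TPW 30 mm, with made-up long-term means and coefficients.
mu_t = linear_mean_with_covariates(
    r_s=6.0, r_s_mean=3.0, mu_c=4.0, alpha2=0.25, alpha3=0.5,
    covariates=[2.0, 30.0], covariate_means=[4.0, 20.0],
    covariate_alphas=[0.5, 0.1],
)
```

Dividing each covariate by its own mean puts quantities with different units, such as precipitation rate and TPW, on a comparable dimensionless scale before they are weighted.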


The models described above are consistent with the notions that satellite errors are
multiplicative (Tian et al., 2013) and that error magnitude grows with $R_s(t)$. They bear
passing resemblance to the PUSH model of Maggioni et al. (2014), in that the conditional
distribution of estimated true precipitation $F_{\mu(t),\sigma(t),\delta(t)}$ given $R_s(t)$ is assumed to be
Gamma distributed, though we use the 3-parameter CSGD rather than the conventional
2-parameter Gamma used to model precipitation hits in PUSH. This allows the estimated
true precipitation to be zero even if $R_s(t) > 0$ (i.e. a false alarm), or vice
versa (missed precipitation). PUSH, in contrast, accounts for false alarms and misses using
separate models, which makes it impossible to construct a theoretical distribution for estimated
true precipitation and involves additional parameters. Like PUSH, the CSGD framework
has the advantage of being parametric, which can be helpful in conditions of very low or
very high precipitation rates (Gebremichael et al., 2011b; Zhang et al., 2013).


4. Results and Discussion


4.1. CSGD-Based Precipitation Characterization

Estimates of $\mu$, $\sigma$, and $\delta$ for 1998-2013 for NLDAS-2 and TMPA are compared for every
grid cell over CONUS (Figure 4). All three parameters in both TMPA and NLDAS-2
exhibit higher values in the eastern United States and the Pacific coastal mountains than in
the western United States. This should be expected due to the higher amounts of
precipitation in these parts of the country (see Figure 1). TMPA tends to overestimate $\mu$
and $\sigma$ and underestimate $\delta$ relative to NLDAS-2 except in the Pacific coastal and Rocky
Mountains. Differences in $\mu$ and $\sigma$ in the western United States are lower in magnitude,
though the relative differences are approximately uniform except for over mountains.
Isolated or small clusters of seemingly anomalous parameter values can be seen in TMPA
but not in NLDAS-2. Visual inspection shows that these are co-located with water bodies
such as lakes and reservoirs that are known to influence PMW-based precipitation
estimates (Tian and Peters-Lidard, 2007).


POP cannot be evaluated directly from Figure 4. Over CONUS, POP for TMPA is more 
uniform and significantly lower than in NLDAS-2, suggesting that the precipitation 
detection limits imposed by the satellite sensors or processing algorithms exert strong 
controls (Figure 5). The TRMM sensor package was designed to detect moderate to heavy
rainfall and thus tends to underestimate light precipitation and mixed phase/falling snow.
GPM can detect a much broader spectrum of precipitation. As with the parameter estimates
in Figure 4, anomalous isolated POP values are co-located with water bodies. We do not
explore this issue further in this study, but Maggioni et al. (2014) suggest that a minimum
detection threshold of 0.25 mm/d may be a reasonable approximation in TMPA and their
PUSH error model utilizes this threshold to distinguish between precipitation and
non-precipitation. The linear and nonlinear conditional CSGD models described in Section 3.2
do allow for nonzero true precipitation even when $R_s(t) = 0$, and thus the CSGD approach
need not explicitly consider detection thresholds.


4.2. Error Modeling using the Conditional CSGD Framework

Before showing CONUS-wide error modeling and correction results using the CSGD 
framework, we provide a more detailed illustration of the linear and nonlinear models and 
comparison with the PUSH model from Maggioni et al. (2014) for the 0.25° grid cell 
nearest to Charlotte, North Carolina (Figure 6). The models and data, including the 1998-2013
training period and 2014 validation period, are shown on both linear (left panels) and
logarithmic scales (right panels). For both Charlotte and other locations across CONUS,
TMPA tends to overestimate at higher precipitation rates. This overestimation is consistent 
with previous studies (AghaKouchak et al., 2011; Tian et al., 2009) and may be due to the 
joint effect of TMPA’s monthly bias correction and poor light precipitation detection, 
which would tend to introduce a high bias in precipitation magnitude (Tian et al., 2009; 
Wright et al., 2017). However, since NLDAS-2 does not account for gage undercatch, it 
almost certainly underestimates true heavy precipitation to an unknown degree. Thus, the 
extent to which TMPA overestimates true precipitation for large events is difficult to assess
without a more detailed reference dataset.


The linear and nonlinear versions of the CSGD-based error model provide good fits to the 
data for both the training and validation periods, and the nonlinear variant better captures 
the nonlinearity in conditional bias that is evident in high precipitation. PUSH greatly 
overestimates conditional bias for high precipitation, and no points fall outside of the lower 
bound of that model’s 95% spread, which is unrealistic given the relatively large sample 
size. In contrast, approximately 5% of points fall outside of the 95% quantile spread for 
the CSGD model (note that not all data points are clearly visible in Figure 6, particularly 


those that fall very close to either axis). 
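The coverage check described above (roughly 5% of points should fall outside a well-calibrated 95% spread) can be sketched as a simple count. The function name and the synthetic bounds below are hypothetical; in practice the lower and upper bounds would be the 2.5% and 97.5% quantiles of the conditional distribution for each observation.

```python
def fraction_outside(reference_values, lower_bounds, upper_bounds):
    """Fraction of reference values falling outside their predictive interval."""
    outside = sum(
        1
        for value, low, high in zip(reference_values, lower_bounds, upper_bounds)
        if value < low or value > high
    )
    return outside / len(reference_values)

# Synthetic illustration: 20 days with made-up intervals, one of which
# fails to contain the reference value.
reference_values = [float(day) for day in range(20)]
lower_bounds = [value - 1.0 for value in reference_values]
upper_bounds = [value + 1.0 for value in reference_values]
reference_values[7] = 50.0   # this value falls above its interval
miss_rate = fraction_outside(reference_values, lower_bounds, upper_bounds)
```

A miss rate far below the nominal 5% suggests intervals that are too wide, and a rate far above it suggests intervals that are too narrow.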


We evaluate a range of conditional CSGD error model complexities; specifically, models
using different versions of Equations 3, 6, and 7 to estimate $\mu(t)$. CONUS-wide evaluation
using root-mean-square error (RMSE) from two versions, the linear model without
covariates and the nonlinear model with MERRA-2 precipitation, is shown in Figure 7.
Here and in subsequent calculations using CSGD error models, RMSE and other error
metrics are computed between NLDAS-2 and the conditional CSGD median. As noted in
Section 3.2, the conditional CSGD mean is always nonzero and greater than the median,
which for low precipitation rates can be equal to zero. This means that neither the
conditional mean nor the median is an ideal measure of central tendency, but investigation
of a more appropriate summary statistic is beyond the scope of this study. The linear model
improves upon the TMPA dataset (i.e. reduces RMSE) except in the Rockies and Pacific 
coastal mountains, where performance is poor. The nonlinear model with MERRA-2 
precipitation offers further improvement, including in these mountainous areas. Reductions 
in RMSE are greatest in the northern part of the country (particularly the nonlinear model 
with MERRA-2 precipitation) and in the high-altitude but lower-relief portions of the
Intermountain West such as the upper Rio Grande in southern Colorado and northern New
Mexico and the Snake River Plain in southern Idaho.


The substantial improvements provided by the nonlinear model with MERRA-2 covariates
in the northeastern and northwestern parts of the country are likely attributable to the
relatively higher proportion of stratiform precipitation in those regions, which is generally
better estimated by atmospheric models than by satellite sensors. The more complex model
also improves upon simpler versions in most of the Rockies and west coast mountains.
Visual inspection of results for a range of models reveals that most of this improvement
stems from inclusion of MERRA-2, rather than from the nonlinear model structure (results
not shown). Error reductions are associated with the identification and removal of
systematic errors and, in the case of models that include MERRA-2 covariates, some
further reduction of random errors.


We compute the RMSE and mean absolute error (MAE) normalized by the mean daily 
precipitation (henceforth referred to as NRMSE and NMAE, respectively) for each 0.25°
grid cell across CONUS for a range of CSGD model configurations. This allows us to 
compare the relative reduction in errors achieved in various precipitation hydroclimates. 
Results are then summarized by computing the CONUS-wide median and interquartile 
range (IQR) of NRMSE and NMAE (Table 1). These nonparametric summary statistics 
were chosen rather than the mean and standard deviation because in arid parts of the 
country, normalizing by a daily mean precipitation close to zero can produce spurious 


results. 
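The normalized metrics and their CONUS-wide summary can be sketched as follows. The sketch assumes that normalization uses the mean of the reference series at each grid cell, which the text does not state explicitly, and it uses the "inclusive" quartile definition from Python's `statistics` module as one of several reasonable choices. The function names and sample values are hypothetical.

```python
import math
import statistics

def normalized_errors(estimates, reference):
    """Return (NRMSE, NMAE) for one grid cell.

    Both metrics are divided by the mean of the reference series, an
    assumption of this sketch.
    """
    n = len(reference)
    mean_reference = sum(reference) / n
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reference)) / n)
    mae = sum(abs(e - r) for e, r in zip(estimates, reference)) / n
    return rmse / mean_reference, mae / mean_reference

def median_and_iqr(values):
    """Median and interquartile range across grid cells."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1

# One made-up grid cell (mm/d).
nrmse, nmae = normalized_errors([0.0, 2.0, 4.0], [0.0, 1.0, 5.0])

# Made-up NRMSE values for five cells; the last mimics an arid cell whose
# near-zero mean precipitation inflates the normalized error.
median_nrmse, iqr_nrmse = median_and_iqr([0.5, 0.7, 0.9, 1.1, 40.0])
```

In this toy example the single inflated value leaves the median at 0.9 and the IQR near 0.4, whereas the mean of the five values would be above 8, which illustrates why rank-based summaries suit this setting.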


The NRMSE and NMAE for the uncorrected TMPA dataset show slightly increased
accuracy for the validation period, relative to the training period, possibly associated with
improvements in the number and quality of satellite sensors over the lifetime of TMPA. In
contrast, the error statistics for the CSGD models tend to be unchanged or slightly worse for
the 2014 validation period, though in all cases the validation performance is within 7% of
the training period in terms of RMSE and within 5% in terms of MAE, suggesting
relatively robust model performance.


The linear (nonlinear) model improved median NRMSE by 20% (22%) and median NMAE
by 17% (19%) for the training period, with similar performance in the validation period.
MERRA-2 covariates improved upon this “baseline” CSGD model performance. The
inclusion of MERRA-2 precipitation offers robust improvements to both NRMSE and
NMAE (32% and 33%, respectively, in the case of the nonlinear version). Inclusion of
MERRA-2 TPW alone (i.e. without MERRA-2 precipitation) offers very little
improvement in either the linear or the nonlinear model. When both MERRA-2 TPW and
precipitation are included, neither the linear nor the nonlinear model shows much
improvement over the case in which only the precipitation covariate is included. This
implies that precipitation from MERRA-2 is a much stronger predictor of true precipitation
than TPW. It also suggests that MERRA-2 precipitation and TPW are highly correlated,
which is unsurprising.
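The percentages quoted above follow directly from the training-period medians in Table 1.
The short Python sketch below reproduces that arithmetic. The helper name is hypothetical,
and the reductions are computed from the CONUS-wide medians rather than as medians of
per-cell reductions, which is how the text reads.

```python
def percent_reduction(baseline, corrected):
    """Relative reduction of an error statistic, in percent of the baseline."""
    return 100.0 * (baseline - corrected) / baseline


# Training-period (1998-2013) medians copied from Table 1.
nrmse = {
    "Uncorrected TMPA": 2.73,
    "Linear": 2.19,
    "Nonlinear": 2.14,
    "Nonlinear with MERRA-2 precipitation": 1.85,
}
nmae = {
    "Uncorrected TMPA": 0.98,
    "Linear": 0.81,
    "Nonlinear": 0.79,
    "Nonlinear with MERRA-2 precipitation": 0.66,
}

for model in nrmse:
    if model == "Uncorrected TMPA":
        continue
    d_nrmse = percent_reduction(nrmse["Uncorrected TMPA"], nrmse[model])
    d_nmae = percent_reduction(nmae["Uncorrected TMPA"], nmae[model])
    print(f"{model}: NRMSE -{d_nrmse:.0f}%, NMAE -{d_nmae:.0f}%")
```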


A linear CSGD error model was tested in which the size of the TMPA and NLDAS-2
samples at each grid cell was expanded by concatenating the data from the eight adjacent
grid cells for model fitting. Referred to in Table 1 as “linear with spatial pooling,” this
model produced similar results to the linear model fitted only to data from individual grid
cells (“linear” in Table 1). This has several implications. In complex terrain or near water
bodies, precipitation can vary over relatively short distances. In such cases, spatial pooling
may create an enlarged sample that does not properly represent precipitation statistics in
the grid cell in question. Visual inspection of RMSE maps shows similar performance
between pooled and unpooled linear CSGD models in the eastern portion of the country,
and lower performance using pooling in the mountain west, consistent with this intuition
(results not shown). In addition, the value added through spatial pooling is inherently
limited if there is substantial spatial correlation in the precipitation estimates and errors
between adjacent grid cells. The similar performance between pooled and unpooled models
in less varied terrain also implies that the model fitting procedure is relatively robust to
small samples.
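The pooling step can be illustrated with a short Python sketch. The nested-list grid layout
and the function name are hypothetical, and the handling of cells on the domain edge
(pooling only the neighbors that exist) is an assumption that the text does not address.

```python
def pooled_sample(grid, row, col):
    """Concatenate the series at (row, col) with those of its adjacent cells.

    grid[i][j] is the list of daily values for the cell in row i, column j.
    Interior cells pool nine series (the cell plus its eight neighbors).
    Neighbors outside the domain are skipped, so edge cells pool fewer.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    pooled = []
    for i in range(row - 1, row + 2):
        for j in range(col - 1, col + 2):
            if 0 <= i < n_rows and 0 <= j < n_cols:
                pooled.extend(grid[i][j])
    return pooled
```

Calling the function with the same `row` and `col` on the TMPA grid and on the NLDAS-2 grid
keeps the satellite and reference values paired, because both pooled lists are built in the
same cell order.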


We evaluate errors in TMPA as a function of the correlation between TMPA and
NLDAS-2, before and after applying a nonlinear CSGD model with MERRA-2
precipitation (Figure 8). The influence of land surface elevation, as a proxy for
topographic relief, is also evaluated, since this impact is somewhat difficult to assess in
Figure 7. Both the absolute values and the variability in NRMSE and NMAE are relatively
low for locations with high correlation, while the variability (though not the central
tendency) in these statistics increases for locations with lower correlation, and there is a
relatively weak inverse relationship between error magnitude and correlation between the
SMPP and reference. Neither correlation nor elevation appears to be the primary control
on NRMSE or NMAE, even though correlation values for higher-elevation locations tend
to be relatively low. It also appears from Figure 8 that similar reductions in NRMSE and
NMAE can be achieved regardless of correlation or land surface elevation. Qualitatively
similar results were produced with the simpler linear model (not shown).


Like NRMSE and NMAE, correlation between the uncorrected TMPA and NLDAS-2 is
slightly higher in the validation period than the training period, again likely associated with
improvements in the quality and number of sensors. Interestingly, linear and nonparametric
correlations between corrected SMPP timeseries and NLDAS-2 decrease somewhat when
TMPA is fed through a linear CSGD model without covariates, and remain relatively
unchanged when a nonlinear model is used instead (Table 2). This may be due to the
limitations of using either the CSGD mean or median and due to the implicit bias
adjustment in the CSGD framework. When MERRA-2 precipitation is included as a
covariate, however, correlation between the corrected SMPP timeseries and NLDAS-2
increases. This highlights the ability of MERRA-2 covariates (particularly precipitation) to
reduce random errors in TMPA.
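For reference, the linear (Pearson) and nonparametric (Spearman) correlations reported in
Table 2 can be computed for one grid cell as in the Python sketch below. The function names
are hypothetical. Assigning average ranks to tied values is an assumption, chosen because
daily precipitation series contain many tied zeros.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```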


We also examined the realtime version (TMPA-RT) with several CSGD models (Table 3).
Errors in the original TMPA-RT dataset are 14% larger in terms of NRMSE and 8% larger
in terms of NMAE than in the research version analyzed previously. Results are
qualitatively similar to Table 1, with all CSGD models showing improvement over the
uncorrected TMPA-RT dataset, and with the largest improvements coming from the
nonlinear model with MERRA-2 precipitation. Likewise, error statistics are generally
comparable for the 2014 validation period, showing minimal loss of performance as
compared to the training period. The degree of error reduction achieved by the CSGD
models is greater using TMPA-RT than TMPA. For example, relative to the uncorrected
TMPA-RT, the linear CSGD model reduced NRMSE (NMAE) by 25% (20%), while the
same model reduced error for the research version by 20% (17%). Reduction in NRMSE
(NMAE) relative to the uncorrected TMPA-RT was as high as 39% (37%) for the nonlinear
CSGD with MERRA-2 precipitation. These results are consistent with the notion that error
models identify and remove systematic biases, since Maggioni et al. (2016) reported higher
systematic errors in TMPA-RT than the research version.


4.3 Parameter Sensitivity

The results for the validation period shown in Tables 1 and 2 provide an initial indication
that the CSGD framework can be applied outside of the training period. To investigate this
issue further, we re-estimate the CSGD parameters for NLDAS-2 and TMPA, as well as
the regression parameters for the linear version of the conditional CSGD model, for each
year individually from 1998-2013 and for successively longer time periods (i.e. 1998-1999,
1998-2000, etc.) for the grid cell nearest to Charlotte, North Carolina (Figure 9). While
parameters vary somewhat from year to year, estimates using longer time periods converge
to relatively stable values after several years. Exceptions are the slight downward trend in
α2 and the upward trend in α3. It is well known that the spatial and temporal statistical
consistency of precipitation datasets varies according to input data availability, such as the
number of rain gages (Hamlet and Lettenmaier, 2005) or the quality and type of satellite
sensor (Cho and Chun, 2008). The trends in α2 and α3 are consistent with improvement in
precipitation estimation in TMPA (i.e. reduction in the weight given to the regression
intercept and increase in the weight given to Rs). Parameters for the nonlinear model and
for other locations are similarly stable over time (results not shown).
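The single-year and successively longer estimation windows can be organized as in the
Python sketch below. The function names and the dictionary layout are hypothetical. The
estimator shown (wet-day fraction and mean wet-day amount) is a deliberately simple
stand-in: the study instead fits the CSGD and regression parameters with a continuous
ranked probability score-based procedure, which is not reproduced here.

```python
def expanding_window_estimates(data_by_year, estimator):
    """Estimate parameters per year and for successively longer periods.

    data_by_year maps a year to its list of daily values. The result maps
    each year to (estimate from that year alone,
                  estimate from the first year through that year).
    """
    results = {}
    cumulative = []
    for year in sorted(data_by_year):
        cumulative.extend(data_by_year[year])
        results[year] = (estimator(data_by_year[year]), estimator(cumulative))
    return results


def wet_day_summary(values, threshold=0.0):
    """Stand-in estimator: (probability of precipitation, mean wet-day amount)."""
    if not values:
        raise ValueError("need at least one value")
    wet = [v for v in values if v > threshold]
    probability = len(wet) / len(values)
    mean_wet = sum(wet) / len(wet) if wet else 0.0
    return probability, mean_wet
```

Plotting the first element of each result against year gives the markers in a figure such as
Figure 9, and the second element gives the lines. Any fitting routine that takes a list of
daily values can replace `wet_day_summary`.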


These results suggest that the continuous ranked probability score-based parameter
estimation procedure for the climatological CSGD and the conditional CSGD regression
framework is relatively efficient with respect to data requirements, and that several years
of coincident reference data may be sufficient. It would be worthwhile to evaluate this issue
using error metrics such as RMSE or MAE. We leave this as a topic of future work, though
it is worth noting that Scheuerer and Hamill (2015) found relatively poor conditional
CSGD performance with a one-year training sample but good performance with modest
increases in training record length.


5. Summary and Discussion


Using the censored shifted gamma distribution (CSGD), we characterize the climatology
of daily precipitation over CONUS from TMPA, a satellite multisensor precipitation
product (SMPP), and NLDAS-2, a reference (i.e. rain gage-based) dataset. We also use a
conditional CSGD error modeling framework to quantify and reduce errors in TMPA. The
CSGD describes both precipitation occurrence and magnitude, and reveals significant
differences between TMPA and NLDAS-2, including poor satellite-based estimation over
inland water bodies and mountainous regions. The CSGD-based error modeling framework
considers errors in both the detection and magnitude of precipitation and can model
systematic bias as either a linear or nonlinear function of precipitation rate. Both versions
perform better than an existing error model from Maggioni et al. (2014) over a wide range
of precipitation magnitudes for daily precipitation.


The framework suffers most in areas of high topographic relief (though not necessarily in
areas of high elevation). Error reduction at a specific location depends on the relative
balance of systematic and random error in the SMPP at that location. Preliminary analyses
demonstrate that parameter estimation for both the CSGD and the CSGD-based error
framework is relatively insensitive to record length for periods of record longer than
several years.

In addition, we show that errors in TMPA can be reduced by incorporating covariates from
MERRA-2 atmospheric reanalysis, despite its relatively coarse resolution. This is the first
study that we are aware of in which the potential benefits of merging numerical weather
prediction and SMPP are explored quantitatively. Precipitation from MERRA-2 offers
robust increases in performance, particularly in mountainous areas, while MERRA-2
precipitable water provides little improvement. The improvements offered by MERRA-2
appear to be due to the better performance of numerical models, relative to satellite-based
instruments, in resolving stratiform precipitation. Other numerical weather models that
have higher resolution or that assimilate more independent observations would likely
provide additional improvement.


It should be emphasized that precipitation error models can only isolate and thus remove
systematic errors. The errors remaining after the removal of systematic bias, i.e. the random
errors, can be described statistically but not reduced or eliminated. The variability in these
residuals can only be explained via the inclusion of additional information. Except for
models that include MERRA-2 covariates, therefore, the error reductions shown
throughout Section 4.2 stem solely from the identification and removal of systematic
errors. MERRA-2 covariates can explain some amount of residual (i.e. random) error, as
evidenced by the further reductions in errors and increased correlations.
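The distinction between systematic and random error can be made concrete with a common
regression-based decomposition of mean squared error. The Python sketch below is an
illustration only: it is not the CSGD framework and is not taken from this study, and the
function name is hypothetical. It fits an ordinary least-squares line of satellite values on
reference values, attributes the departure of that line from the 1:1 line to systematic
error, and attributes the scatter about the line to random error.

```python
def decompose_mse(satellite, reference):
    """Split MSE into (systematic, random) parts via an OLS fit.

    The fit is satellite = intercept + slope * reference. For an OLS fit the
    two parts sum exactly to the mean squared error of satellite vs. reference.
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_sat = sum(satellite) / n
    sxx = sum((r - mean_ref) ** 2 for r in reference)
    if sxx == 0.0:
        raise ValueError("reference series must not be constant")
    sxy = sum((r - mean_ref) * (s - mean_sat) for r, s in zip(reference, satellite))
    slope = sxy / sxx
    intercept = mean_sat - slope * mean_ref
    fitted = [intercept + slope * r for r in reference]
    systematic = sum((f - r) ** 2 for f, r in zip(fitted, reference)) / n
    random_part = sum((s - f) ** 2 for s, f in zip(satellite, fitted)) / n
    return systematic, random_part
```

In this picture, a bias-correcting error model can at best drive the systematic part toward
zero. The random part shrinks only when extra information, such as the MERRA-2
covariates, explains some of the scatter.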


The error reduction achieved in this study is generally consistent with the levels of
systematic error found over the eastern United States at the same spatial and temporal
resolution by Maggioni et al. (2016), though more work is needed to reconcile
discrepancies between the degree of systematic bias shown here and that shown by those
authors in the mountain west. Also consistent with Maggioni et al. (2016), improvements
in TMPA-RT were relatively greater than for the gage-corrected non-realtime version,
suggesting that the CSGD approach has particular advantages for near-realtime
applications. The CSGD approach, coupled with realtime numerical weather prediction
estimates such as those generated using NASA’s GEOS-5 (Rienecker et al., 2008), offers a
pathway to improve the accuracy of near-realtime SMPP and to parameterize the remaining
random errors.


Certain relevant issues were not explored in this study. Maggioni et al. (2014) concluded
that seasonally varying model parameters offered no major advantage in their error model,
and our initial investigations into seasonality, which are omitted here in the interest of
brevity, confirm this. Errors in the NLDAS-2 reference data, including those due to rain
gage undercatch, were not considered and can be significant, particularly in the cold
season and in steep terrain.


Many applications, such as hydrologic modeling, can require subdaily precipitation inputs.
SMPP errors in magnitude grow with increasing resolution. The autocorrelation of daily
precipitation is relatively low, but increases as temporal resolution becomes finer. Thus,
generating a realistic high-resolution timeseries of precipitation using the CSGD approach
or other error models requires consideration of this autocorrelation. The same is true for
generating spatially-correlated precipitation fields.

One key challenge with the CSGD framework, and precipitation error modeling more
generally, is transferability to regions that lack reference data. This issue requires
significant further effort, but several previous studies have shown promise (Gebregiorgis
and Hossain, 2013, 2014). The CSGD framework would be a strong candidate for such
efforts, due to its relatively simple structure, robust performance, and ability to include
relevant atmospheric variables from numerical weather prediction, which may potentially
be even more useful in data-limited settings. Resolving such issues would constitute a
major step toward quantifying and reducing errors in satellite precipitation estimates and
helping users to better understand the implications of remaining irreducible random errors.

Tables

Table 1: Median of CONUS-wide NRMSE and NMAE for TMPA vs. NLDAS-2 and for
a range of CSGD error models. Values in parentheses give the interquartile range (IQR;
i.e., 25th-75th percentiles). The models are fit to the 1998-2013 time period, while 2014 is
reserved for validation.
Table 2: CONUS-wide median and IQR for Pearson and Spearman correlation
coefficients for TMPA vs. NLDAS-2 and for a range of CSGD error models. The models
are fit to the 1998-2013 period, while 2014 is reserved for validation.
Table 3: As per Table 1, but using TMPA-RT and with a reduced set of CSGD error
models.

Table 1: Median of CONUS-wide NRMSE and NMAE for TMPA vs. NLDAS-2 and for
a range of CSGD error models. Values in parentheses give the interquartile range (IQR;
i.e., 25th-75th percentiles). The models are fit to the 1998-2013 time period, while 2014 is
reserved for validation.
CSGD Error Model                             | NRMSE [-], 1998-2013     | NRMSE [-], 2014   | NMAE [-], 1998-2013 | NMAE [-], 2014
---------------------------------------------|--------------------------|-------------------|---------------------|------------------
Uncorrected TMPA                             | 2.73 ([illegible], 3.25) | 2.54 (2.12, 3.14) | 0.98 (0.87, 1.11)   | 0.92 (0.81, 1.05)
Linear                                       | 2.19 (1.86, 2.74)        | 2.25 (1.89, 2.84) | 0.81 (0.74, 0.89)   | 0.80 (0.73, 0.89)
Linear with spatial pooling                  | 2.20 (1.87, 2.73)        | 2.26 (1.89, 2.85) | 0.81 (0.74, 0.89)   | 0.80 (0.73, 0.89)
Nonlinear                                    | 2.14 (1.82, 2.70)        | 2.22 (1.84, 2.83) | 0.79 (0.72, 0.88)   | 0.79 (0.72, 0.88)
Linear with MERRA-2 precipitation            | 1.88 (1.55, 2.36)        | 1.99 (1.60, 2.58) | 0.67 (0.60, 0.75)   | 0.69 (0.61, 0.78)
Linear with MERRA-2 TPW                      | 2.17 (1.83, 2.71)        | 2.22 (1.85, 2.82) | 0.79 (0.72, 0.88)   | 0.79 (0.71, 0.88)
Linear with MERRA-2 precipitation and TPW    | 1.87 (1.54, 2.36)        | 1.98 (1.59, 2.56) | 0.67 (0.60, 0.75)   | 0.68 (0.61, 0.78)
Nonlinear with MERRA-2 precipitation         | 1.85 (1.53, 2.33)        | 1.97 (1.58, 2.55) | 0.66 (0.59, 0.74)   | 0.69 (0.61, 0.77)
Nonlinear with MERRA-2 TPW                   | 2.13 (1.80, 2.69)        | 2.21 (1.82, 2.80) | 0.78 (0.71, 0.87)   | 0.78 (0.71, 0.87)
Nonlinear with MERRA-2 precipitation and TPW | 1.84 (1.52, 2.33)        | 1.97 (1.57, 2.55) | 0.66 (0.58, 0.74)   | 0.69 (0.61, 0.78)

Table 2: CONUS-wide median and IQR for Pearson and Spearman correlation coefficients
for TMPA vs. NLDAS-2 and for a range of CSGD error models. The models are fit to the
1998-2013 period, while 2014 is reserved for validation.

CSGD Error Model                     | Pearson, 1998-2013 | Pearson, 2014     | Spearman, 1998-2013 | Spearman, 2014
-------------------------------------|--------------------|-------------------|---------------------|------------------
Uncorrected TMPA                     | 0.65 (0.53, 0.71)  | 0.67 (0.56, 0.74) | 0.53 (0.44, 0.62)   | 0.58 (0.49, 0.65)
Linear                               | 0.63 (0.51, 0.69)  | 0.65 (0.53, 0.73) | 0.52 (0.43, 0.60)   | 0.56 (0.46, 0.63)
Nonlinear                            | 0.65 (0.53, 0.71)  | 0.67 (0.56, 0.74) | 0.53 (0.44, 0.61)   | 0.57 (0.47, 0.63)
Linear with MERRA-2 precipitation    | 0.74 (0.68, 0.79)  | 0.75 (0.67, 0.81) | 0.70 (0.62, 0.75)   | 0.72 (0.65, 0.78)
Nonlinear with MERRA-2 precipitation | 0.75 (0.70, 0.80)  | 0.76 (0.68, 0.81) | 0.71 (0.64, 0.76)   | 0.73 (0.66, 0.78)


Table 3: As per Table 1, but using TMPA-RT and with a reduced set of CSGD error
models.

CSGD Error Model                     | NRMSE [-], 1998-2013 | NRMSE [-], 2014   | NMAE [-], 1998-2013 | NMAE [-], 2014
-------------------------------------|----------------------|-------------------|---------------------|------------------
Uncorrected TMPA-RT                  | 3.10 (2.29, 4.24)    | 3.08 (2.37, 4.09) | 1.06 (0.87, 1.38)   | 1.05 (0.90, 1.29)
Linear                               | 2.32 (1.93, 3.00)    | 2.38 (1.96, 3.06) | 0.84 (0.76, 0.94)   | 0.84 (0.76, 0.93)
Nonlinear                            | 2.25 (1.87, 2.94)    | 2.32 (1.89, 3.01) | 0.82 (0.73, 0.92)   | 0.83 (0.74, 0.92)
Linear with MERRA-2 precipitation    | 1.91 (1.56, 2.49)    | 2.05 (1.63, 2.68) | 0.68 (0.60, 0.78)   | 0.71 (0.62, 0.81)
Nonlinear with MERRA-2 precipitation | 1.88 (1.55, 2.44)    | 2.01 (1.60, 2.64) | 0.67 (0.59, 0.76)   | 0.70 (0.62, 0.80)

Figures

Figure 1: CONUS study area land surface elevation (top) and mean annual precipitation
from NLDAS-2 (bottom).
Figure 2: CDF for an arbitrary CSGD distribution. Note that the CDF fully describes both
the probability of zero and non-zero precipitation, as well as precipitation intensity.
Figure 3: Top panel: empirical CDFs (markers) and CSGD theoretical CDFs (lines) for
NLDAS-2 and TMPA for Charlotte, North Carolina and Denver, Colorado. A log scale is
used for rainfall to improve readability. Bottom panel: conditional CSGD theoretical
CDFs generated using the linear model described in Section 3 for Rs(t) = 2.5 and 25 mm/d.
Figure 4: Climatological CSGD parameters μ, σ, and δ for the 1998-2013 period for
NLDAS-2 (left), TMPA (middle), and the difference (right).
Figure 5: Probability of precipitation for the 1998-2013 period using NLDAS-2 (top) and
TMPA (bottom).
Figure 6: Linear (top panels) and nonlinear (bottom panels) conditional CSGD models for
the 0.25° grid cell nearest to Charlotte, North Carolina compared with observations and
the PUSH model for the 1998-2013 training period (grey dots) and the 2014 validation
period (orange dots). The sample data and models are shown in the left and right panels but
the axes are linear (left panels) and logarithmic (right panels).
Figure 7: Top and middle panels: all-season RMSE for 1998-2013, computed relative to
the NLDAS-2 reference: (a) research version of TMPA; (b) linear model; (c) nonlinear
model with MERRA-2 precipitation. Bottom panels: percentage change in RMSE
relative to TMPA results in panel (a): (d) linear model; (e) nonlinear model with
MERRA-2 precipitation. Inset values in parentheses are the means of all grid cells in
CONUS.
Figure 8: NRMSE (top panels) and NMAE (bottom panels) as a function of Spearman
correlation coefficient for every 0.25° grid cell in the CONUS study domain. Left panels
show results for the TMPA dataset for 1998-2013; right panels show results for the
nonlinear CSGD model with MERRA-2 precipitation. Point colors indicate average land
surface elevation in the grid cell.
Figure 9: Parameter estimates as a function of precipitation record length from 1998-2013
for the 0.25° grid cell nearest to Charlotte, North Carolina. Top: CSGD for NLDAS-2;
middle: CSGD for TMPA; bottom: regression parameters for linear model. Markers
indicate parameter estimates based on that single year of data, while the lines indicate
parameter estimates based on data from 1998 to that year.

Figure 1: CONUS study area land surface elevation (top) and mean annual precipitation
from NLDAS-2 (bottom).

Figure 2: CDF for an arbitrary CSGD distribution. Note that the CDF fully describes both
the probability of zero and non-zero precipitation, as well as precipitation intensity.

Figure 3: Top panel: empirical CDFs (markers) and CSGD theoretical CDFs (lines) for
NLDAS-2 and TMPA for Charlotte, North Carolina and Denver, Colorado. A log scale is
used for rainfall to improve readability. Bottom panel: conditional CSGD theoretical
CDFs generated using the linear model described in Section 3 for Rs(t) = 2.5 and 25 mm/d.

Figure 4: Climatological CSGD parameters μ, σ, and δ for the 1998-2013 period for
NLDAS-2 (left), TMPA (middle), and the difference (right).

Figure 5: Probability of precipitation for the 1998-2013 period using NLDAS-2 (top) and
TMPA (bottom).


[Figure 6 plot residue: panel title “Linear Model”; axes “Satellite Observation [mm/d]” and “Reference Observation [mm/d]”; legend entries: 1998-2013 Training Period, 2014 Validation Period, CSGD mean, CSGD median, CSGD 95% spread, PUSH “hit case” median, PUSH “hit case” 95% spread]

Figure 6: Linear (top panels) and nonlinear (bottom panels) conditional CSGD models for
the 0.25° grid cell nearest to Charlotte, North Carolina compared with observations and
the PUSH model for the 1998-2013 training period (grey dots) and the 2014 validation
period (orange dots). The sample data and models are shown in the left and right panels but
the axes are linear (left panels) and logarithmic (right panels).

Figure 7: Top and middle panels: all-season RMSE for 1998-2013, computed relative to
the NLDAS-2 reference: (a) research version of TMPA; (b) linear model; (c) nonlinear
model with MERRA-2 precipitation. Bottom panels: percentage change in RMSE relative
to TMPA results in panel (a): (d) linear model; (e) nonlinear model with MERRA-2
precipitation. Inset values in parentheses are the means of all grid cells in CONUS.

Figure 8: NRMSE (top panels) and NMAE (bottom panels) as a function of Spearman
correlation coefficient for every 0.25° grid cell in the CONUS study domain. Left panels
show results for the TMPA dataset for 1998-2013; right panels show results for the
nonlinear CSGD model with MERRA-2 precipitation. Point colors indicate average land
surface elevation in the grid cell.

Figure 9: Parameter estimates as a function of precipitation record length from 1998-2013
for the 0.25° grid cell nearest to Charlotte, North Carolina. Top: CSGD for NLDAS-2;
middle: CSGD for TMPA; bottom: regression parameters for linear model. Markers
indicate parameter estimates based on that individual year of data, while the lines indicate
parameter estimates based on data from 1998 to that year.